Abstract

Analyzing football score data with statistical techniques, we investigate how the not purely 
random, but highly co-operative nature of the game is reflected in averaged properties such as the 
probability distributions of scored goals for the home and away teams. As it turns out, especially the 
tails of the distributions are not well described by the Poissonian or binomial model resulting from 
the assumption of uncorrelated random events. Instead, a good effective description of the data is 
provided by less basic distributions such as the negative binomial one or the probability densities of 
extreme value statistics. To understand this behavior from a microscopical point of view, however, 
no waiting time problem or extremal process need be invoked. Instead, modifying the Bernoulli 
random process underlying the Poissonian model to include a simple component of self-affirmation 
seems to describe the data surprisingly well and allows one to understand the observed deviation from
Gaussian statistics. The phenomenological distributions used before can be understood as special 
cases within this framework. We analyzed historical football score data from many leagues in 
Europe as well as from international tournaments, including data from all past tournaments of 
the "FIFA World Cup" series, and found the proposed models to be applicable rather universally. 
In particular, here we analyse the results of the German women's premier football league and 
consider the two separate German men's premier leagues in the East and West during the cold 
war times and the unified league after 1990 to see how scoring in football and the component of 
self-affirmation depend on cultural and political circumstances. 
I. INTRODUCTION



Football is perhaps the most popular sport in Europe, attracting millions of spectators
and involving thousands of players each year. As a traditional socio-cultural institution of
significant economic importance, football has also been the subject of numerous scientific
efforts, for instance geared towards the improvement of game tactics, the understanding
of the social effects of the fan scene, etc. Much less effort has been devoted, it seems, to
the understanding of football (and other ball sports) from the perspective of the stochastic
behavior of co-operative "agents" (i.e., players) in abstract models. This problem as well as
many other topics relating to the statistical properties of socially interacting systems have
recently been identified as fields where the model-based point-of-view and methodological
machinery of statistical mechanics might add a new perspective to the much more detailed
investigations of more specific disciplines.

Score distributions of football and other ball games have been occasionally considered by
mathematical statisticians for more than fifty years. Initially, the limited available data
were found to be reasonably modeled by the Poissonian distribution resulting from the
simplest assumption of a completely random process with a fixed (but possibly
team-dependent) scoring probability. Subsequently, it was empirically found that a better
fit could be produced with a negative binomial distribution, originally introduced as an
ad hoc measure of generalizing the parameter range for fitting certain biological data.
The negative binomial form occurs naturally for a mixture of Poissonian processes with a
certain distribution of (independent) success probabilities. Furthermore, recently it was
found [8] that score distributions of some football leagues are better described by the
generalized distributions of extreme value statistics [10], while others rather follow the
negative binomial distribution. This yielded a rather inhomogeneous picture and, more
generally, for a system of highly co-operative entities it might be presumed that models
without correlations cannot be an adequate description. What is more, all these proposals
remained in the realm of observation, since the considered statistical models were selected
by best fit, without offering any microscopic justification for the choice.

The distribution of extremes, i.e., the probability density function of ($k$th) maximal or
minimal values of independent realizations of a random variable, is described by only a few
universality classes, depending on the asymptotic behavior of the original distribution.
Apart from the direct importance of the problem of extremes in actuarial mathematics and
engineering, generalized extreme value (GEV) distributions have been found to occur in such
diverse systems as the statistical mechanics of regular and disordered systems [15],
turbulence or earthquake data [18]. However, in most cases global properties were considered
instead of explicit extremes, and the occurrence of GEV distributions led to speculations
about hidden extremal processes in these systems, which in most cases could not be
identified. It was only realized recently that GEV distributions can also arise naturally as
the statistics of sums of correlated random variables [20, 21], which could explain their
ubiquity in physical systems.

For the problem of scoring in football, correlations naturally occur through processes 
of (positive and negative) feedback of scoring on both teams, and we shall see how the 
introduction of simple rules for the adaptation of the success probabilities in a modified 
Bernoulli process upon scoring a goal leads to systematic deviations from Gaussian statistics. 
We find simple models with a single parameter of self-affirmation to best describe the 
available data, including cases with relatively poor fits of the negative binomial distribution. 
The latter is shown to result from one of these models in a particular limit, explaining the 
relatively good fits observed before. For the models under consideration, exact recurrence 
relations and precise closed-form approximations of the probability density functions can 
be derived. Although the limiting distributions of the considered models in general do not 
follow the statistics of extremes, it is demonstrated how alternative models leading to GEV 
distributions could be constructed. The best fits are found for models where each extra goal 
encourages a team even more than the previous one: a true sign of football fever. 

The rest of the paper is organized as follows. Section II discusses the probability
distributions used by us and previous authors to fit football score data and their relations
to the microscopic models introduced here. The results of fits of the considered models and
distributions to the data are summarized and discussed in Sec. III with emphasis on a
comparison of the goal distributions in the divided Germany of the cold war times and of the
German women's and men's premier leagues, and an analysis of the results of the "FIFA World
Cup" series. Finally, Sec. IV contains our conclusions, and some of the statistical
technicalities of the considered modified binomial models are summarized in App. A.
II. PROBABILITY DISTRIBUTIONS AND MICROSCOPIC MODELS



The most obvious and readily available global property characterizing a football match is 
certainly given by the overall score of the game. Hence, to investigate the balance of chance 
and skill in football, here we consider the distributions of goals scored by the home and
away teams in football league or cup matches. To the simplest possible approximation, both 
teams have independent and constant probabilities of scoring during each appropriate time 
interval of the match, thus degrading football to a pure game of chance. Since the scoring 
probabilities will be small, the resulting probabilities of final scores will follow a Poissonian 
distribution, 

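$$P_{\lambda_h}(n_h) = \frac{\lambda_h^{n_h}}{n_h!}\exp(-\lambda_h), \qquad P_{\lambda_a}(n_a) = \frac{\lambda_a^{n_a}}{n_a!}\exp(-\lambda_a), \tag{2.1}$$

where $n_h$ and $n_a$ are the final scores of the home and away teams, respectively, and the
parameters $\lambda_h$ and $\lambda_a$ are related to the average number of goals scored by a
team, $\lambda = \langle n \rangle$. As an additional check of the fit to the data, one might
then also consider the probability densities of the sum $\sigma = n_h + n_a$ and difference
$\delta = n_h - n_a$ of goals scored,

$$\begin{aligned}
P^{\Sigma}_{\lambda_h,\lambda_a}(\sigma) &= \sum_{n=0}^{\sigma} P_{\lambda_h}(n)\,P_{\lambda_a}(\sigma-n) = \frac{(\lambda_h+\lambda_a)^{\sigma}}{\sigma!}\exp[-(\lambda_h+\lambda_a)], \\
P^{\Delta}_{\lambda_h,\lambda_a}(\delta) &= \sum_{n=0}^{\infty} P_{\lambda_h}(n+\delta)\,P_{\lambda_a}(n) = \left(\frac{\lambda_h}{\lambda_a}\right)^{\delta/2} I_{\delta}\!\left(2\sqrt{\lambda_h\lambda_a}\right)\exp[-(\lambda_h+\lambda_a)],
\end{aligned} \tag{2.2}$$

where $I_{\delta}$ is the modified Bessel function (see [22], p. 374). Note that
$P^{\Sigma}_{\lambda_h,\lambda_a}(\sigma)$ is itself a Poissonian distribution with parameter
$\lambda = \lambda_h + \lambda_a$.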
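
As a quick numerical cross-check of Eq. (2.2), the following minimal sketch convolves two Poissonian distributions directly and compares the result with the closed forms for the sum and the difference. The rates 1.8 and 1.2 are illustrative values chosen here, not fit results, and the modified Bessel function is evaluated from its power series purely to keep the example self-contained.

```python
import math

def poisson(n, lam):
    """Poissonian probability of n goals, Eq. (2.1); zero for n < 0."""
    if n < 0:
        return 0.0
    return lam ** n / math.factorial(n) * math.exp(-lam)

def bessel_i(order, x, terms=60):
    """Modified Bessel function I_order(x) for integer order >= 0 (power series)."""
    return sum((x / 2.0) ** (2 * m + order)
               / (math.factorial(m) * math.factorial(m + order))
               for m in range(terms))

def p_sum(s, lam_h, lam_a):
    """Closed form for the total score: Poissonian with rate lam_h + lam_a."""
    return poisson(s, lam_h + lam_a)

def p_diff(d, lam_h, lam_a):
    """Closed form for the goal difference (uses I_{-d} = I_d for integer d)."""
    return ((lam_h / lam_a) ** (d / 2.0)
            * bessel_i(abs(d), 2.0 * math.sqrt(lam_h * lam_a))
            * math.exp(-(lam_h + lam_a)))

def conv_sum(s, lam_h, lam_a):
    """Direct convolution for the sum of home and away goals."""
    return sum(poisson(n, lam_h) * poisson(s - n, lam_a) for n in range(s + 1))

def conv_diff(d, lam_h, lam_a, nmax=80):
    """Direct (truncated) sum for the difference of home and away goals."""
    return sum(poisson(n + d, lam_h) * poisson(n, lam_a) for n in range(nmax))

# Illustrative rates only (assumed for this example).
LAM_H, LAM_A = 1.8, 1.2
for d in range(-2, 3):
    print(d, conv_diff(d, LAM_H, LAM_A), p_diff(d, LAM_H, LAM_A))
```

The two columns printed for each goal difference should agree to numerical precision if home and away scores are independent Poissonian variables.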

Clearly, the assumption of constant and independent scoring probabilities for the teams 
is not appropriate for real-world football matches. Since we are interested in averages over 
the matches during one or several seasons of a football league or cup, one might expect 
a distribution of scoring probabilities $\lambda$ depending on the different skills of the
teams, the lineup for the match, tactics, weather conditions etc., leading to the notion of a
compound Poisson distribution. It can be easily shown [24] that for the special case of the
scoring probabilities $\lambda$ following a gamma distribution,

$$f(\lambda) = \begin{cases} \dfrac{a^{r}\lambda^{r-1}e^{-a\lambda}}{\Gamma(r)}, & \lambda > 0, \\ 0, & \lambda \le 0, \end{cases} \tag{2.3}$$

the resulting compound Poisson distribution has the form of a negative binomial distribution
(NBD),

$$P_{p,r}(n) = \int_0^{\infty} d\lambda\, P_{\lambda}(n)\,f(\lambda) = \frac{\Gamma(r+n)}{n!\,\Gamma(r)}\,p^{n}(1-p)^{r}, \tag{2.4}$$

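where $p = 1/(1+a)$. The negative binomial form has been found to describe football score
data rather well. The underlying assumption of the scoring probabilities following a gamma
distribution seems to be rather ad hoc, however, and fitting different seasons of our data
with the Poissonian model (2.1), the resulting distribution of the parameters $\lambda$ does
not resemble the gamma form (2.3). Analogous to Eq. (2.2), for the negative binomial
distribution (2.4) one can evaluate the probabilities for the sum $\sigma$ and difference
$\delta$ of goals scored by the home and away teams,

$$\begin{aligned}
P^{\Sigma}_{p_h,r_h,p_a,r_a}(\sigma) &= (1-p_h)^{r_h}(1-p_a)^{r_a}\,\frac{\Gamma(r_a+\sigma)}{\sigma!\,\Gamma(r_a)}\,p_a^{\sigma}\;{}_2F_1\!\left(-\sigma, r_h; 1-\sigma-r_a; \frac{p_h}{p_a}\right), \\
P^{\Delta}_{p_h,r_h,p_a,r_a}(\delta) &= (1-p_h)^{r_h}(1-p_a)^{r_a}\,\frac{\Gamma(r_h+\delta)}{\delta!\,\Gamma(r_h)}\,p_h^{\delta}\;{}_2F_1\!\left(r_h+\delta, r_a; 1+\delta; p_h p_a\right) \quad (\delta \ge 0),
\end{aligned} \tag{2.5}$$

where ${}_2F_1$ is the hypergeometric function (see [22], p. 555). Restricting
$p_h = p_a = p$, the distribution of the total score simplifies to
$P^{\Sigma}_{p,r_h,p,r_a}(\sigma) = P_{p,r_h+r_a}(\sigma)$, i.e., one finds a composition law
similar to the case of the Poissonian distribution.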
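
The compound Poisson construction behind Eq. (2.4) and the composition law for equal $p$ can be checked numerically. The sketch below is only an illustration under assumed parameter values ($r = 5$, $a = 3$ and the values in the usage lines are not taken from the fits); the integral over the gamma density is approximated with a simple midpoint rule.

```python
import math

def nbd(n, r, p):
    """Negative binomial probability, Eq. (2.4): Gamma(r+n)/(n! Gamma(r)) p^n (1-p)^r."""
    log_coeff = math.lgamma(r + n) - math.lgamma(n + 1) - math.lgamma(r)
    return math.exp(log_coeff + n * math.log(p) + r * math.log(1.0 - p))

def poisson(n, lam):
    """Poissonian probability of n goals for rate lam > 0."""
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))

def gamma_density(lam, r, a):
    """Gamma density of Eq. (2.3) for lam > 0."""
    return math.exp(r * math.log(a) + (r - 1.0) * math.log(lam)
                    - a * lam - math.lgamma(r))

def mixture(n, r, a, steps=20000, lam_max=40.0):
    """Midpoint-rule approximation of the integral in Eq. (2.4)."""
    h = lam_max / steps
    return h * sum(poisson(n, (i + 0.5) * h) * gamma_density((i + 0.5) * h, r, a)
                   for i in range(steps))

def nbd_sum(s, r_h, r_a, p):
    """Distribution of the total score by direct convolution (equal p for both teams)."""
    return sum(nbd(k, r_h, p) * nbd(s - k, r_a, p) for k in range(s + 1))

# Illustrative parameters only (assumed for this example).
R, A = 5.0, 3.0
for n in range(4):
    print(n, mixture(n, R, A), nbd(n, R, 1.0 / (1.0 + A)))
```

For equal $p$, `nbd_sum(s, r_h, r_a, p)` should coincide with `nbd(s, r_h + r_a, p)`, which is the composition law stated above.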

To do justice to the fact that playing football is different from playing dice, one has to take 
into account that goals are not simply independent events but, instead, scoring certainly has 
a profound feedback on the motivation and possibility of subsequent scoring of both teams 
(via direct motivation/demotivation of the players, but also, e.g., by a strengthening of 
defensive play in case of a lead), i.e., there is a fundamental component of (positive or 
negative) feedback in the system. We do so by introducing such a feedback effect into the
binomial model (being the discrete version of the Poissonian model (2.1) above): consider a
football match divided into $N$ time steps (we restrict ourselves here to the natural choice
$N = 90$, but good fits are found for any choice of $N$ within reasonable limits) with both
teams having the possibility to either score or not score in each time step. Feedback is
introduced into the system by having the scoring probabilities $p$ depend on the number $n$ of
goals scored so far, $p = p(n)$. Several possibilities arise. For our model "A", upon each goal
the scoring probability is modified as

$$p(n) = p(n-1) + \kappa, \tag{2.6}$$

with some fixed constant $\kappa$ (unless $p(n-1) + \kappa > 1$, in which case $p(n) = 1$, or
$p(n-1) + \kappa < 0$, which is replaced by $p(n) = 0$). Alternatively, one might consider a
multiplicative modification rule,

$$p(n) = \kappa\,p(n-1) \tag{2.7}$$

(again modified to ensure $0 \le p(n) \le 1$), which we refer to as model "B". The resulting
modified binomial distributions $P_N(n)$ for the total number of goals scored by one team can
be computed exactly from a Pascal type recurrence relation,

$$P_N(n) = [1-p(n)]\,P_{N-1}(n) + p(n-1)\,P_{N-1}(n-1), \tag{2.8}$$

where, e.g., $p(n) = p_0 + \kappa n$ for model "A" and $p(n) = p_0\kappa^{n}$ for model "B".
Eq. (2.8) is intuitively plausible, since $n$ successes in $N$ trials can be reached either
from $n$ successes in $N-1$ trials plus a final failure or from $n-1$ successes in $N-1$
trials and a final success. For a more formal proof see the discussion in App. A where, for
the additive case of model "A", it is also demonstrated that the continuum limit of $P_N(n)$,
i.e., $N \to \infty$ with $p_0 N$ and $\kappa N$ kept fixed, is given by the negative binomial
distribution (2.4) with $r = p_0/\kappa$ and $p = 1 - e^{-\kappa N}$ (note that this also
includes the "generalized binomial distribution" considered previously). Thus the good fit of
a negative binomial distribution to the data can be understood from the "microscopic" effect
of self-affirmation of the teams or players, without making reference to the somewhat poorly
motivated composition of the pure Poissonian model with a gamma distribution. Finally, the
assumption of independence of the scoring of the home and away teams can be relaxed by
coupling the adaptation rules upon scoring, for instance as

$$\begin{aligned}
p_h(n) &= p_h(n-1)\,\kappa_h, & p_a(n) &= p_a(n-1)/\kappa_a, & &\text{for a goal of the home team,} \\
p_h(n) &= p_h(n-1)/\kappa_h, & p_a(n) &= p_a(n-1)\,\kappa_a, & &\text{for a goal of the away team,}
\end{aligned} \tag{2.9}$$

which we refer to as model "C". If both teams have $\kappa > 1$, this results in an incentive
for the scoring team and a demotivation for the opponent. But a value $\kappa < 1$ is
conceivable as well. The probability density function $P_N(n_h, n_a)$ can be computed
recursively as well, cf. App. A.
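
The recurrence (2.8) is straightforward to implement. The following minimal sketch builds $P_N(n)$ for an arbitrary feedback rule $p(n)$, specializes it to models "A" and "B", and compares model "A" with its negative binomial continuum limit. The parameter values ($N = 90$, $p_0 = 0.015$, $\kappa = 0.005$) are illustrative assumptions, not fit results, and the function names are our own.

```python
import math

def modified_binomial(N, p_of_n):
    """Distribution P_N(n), n = 0..N, from the recurrence (2.8) for a rule p(n)."""
    def p(n):
        # Clip to [0, 1] as required for models "A" and "B".
        return min(1.0, max(0.0, p_of_n(n)))
    dist = [1.0]  # initial condition P_0(0) = 1
    for _ in range(N):
        new = [0.0] * (len(dist) + 1)
        for n, prob in enumerate(dist):
            new[n] += (1.0 - p(n)) * prob   # n successes so far, final failure
            new[n + 1] += p(n) * prob       # n successes so far, final success
        dist = new
    return dist

def model_a(N, p0, kappa):
    """Additive rule p(n) = p0 + kappa * n."""
    return modified_binomial(N, lambda n: p0 + kappa * n)

def model_b(N, p0, kappa):
    """Multiplicative rule p(n) = p0 * kappa**n."""
    return modified_binomial(N, lambda n: p0 * kappa ** n)

def nbd(n, r, p):
    """Negative binomial probability, Eq. (2.4)."""
    log_coeff = math.lgamma(r + n) - math.lgamma(n + 1) - math.lgamma(r)
    return math.exp(log_coeff + n * math.log(p) + r * math.log(1.0 - p))

# Illustrative parameters only (assumed for this example).
N, P0, KAPPA = 90, 0.015, 0.005
exact = model_a(N, P0, KAPPA)
limit = [nbd(n, P0 / KAPPA, 1.0 - math.exp(-KAPPA * N)) for n in range(N + 1)]
max_dev = max(abs(e - l) for e, l in zip(exact, limit))
print("largest deviation of model A from its NBD limit:", max_dev)
```

Each update step costs $O(N)$ operations, so the full distribution is obtained with $O(N^2)$ effort. For vanishing feedback ($\kappa = 0$ in model "A", $\kappa = 1$ in model "B") the ordinary binomial distribution is recovered.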

Starting from the observation that the goal distributions of certain leagues do not seem
to be well fitted by the negative binomial distribution, Greenhough et al. [8] considered fits
of the GEV distributions,

$$\begin{aligned}
P_{\xi,\mu,\sigma}(n) &= \frac{1}{\sigma}\left(1 + \xi\,\frac{n-\mu}{\sigma}\right)^{-1-1/\xi}\exp\left[-\left(1 + \xi\,\frac{n-\mu}{\sigma}\right)^{-1/\xi}\right] & &\text{for } \xi \ne 0, \\
P_{\mu,\sigma}(n) &= \frac{1}{\sigma}\exp\left[-\exp\left(-\frac{n-\mu}{\sigma}\right) - \frac{n-\mu}{\sigma}\right] & &\text{for } \xi = 0,
\end{aligned} \tag{2.10}$$

to the data, obtaining good fits in some cases. According to the value of the parameter
$\xi$, these distributions are known as Weibull ($\xi < 0$), Gumbel ($\xi = 0$) and Fréchet
($\xi > 0$) distributions, respectively. As for the case of the negative binomial form as a
compound Poisson distribution, the use of extremal value statistics appears here rather ad
hoc. We would like to point out, however, that the GEV distributions indeed can result from a
modified microscopic model with feedback. To this end, consider again a series of trials for a
number $N$ of time steps. Assume that the probability to score $u_1$ goals in time step 1 is
distributed according to $P_1(u_1) = P(u_1)$ (e.g., with a Poisson distribution $P$), the
probability to score $u_2$ goals in time step 2 is $P_2(u_2) = P(u_1 + u_2)/Z_2$ etc., such
that $P_i(u_i) = P(\sum_{j=1}^{i-1} u_j + u_i)/Z_i$. For any continuous distribution $P$, this
means that due to the normalization factors $Z_i$ the distribution of $u_i$ will have enhanced
tails compared to the distribution of $u_{i-1}$ (unless $u_{i-1} = 0$) etc., resulting in a
positive feedback effect similar to that of models "A", "B" and "C". We refer to this
prescription as model "D". From the results of Bertin and Clusel [20, 21] it then follows that
the limiting distribution of the total score $n = \sum_{i=1}^{N} u_i$ is a GEV distribution,
where the specific form of distribution [in particular the value of the parameter $\xi$ in
(2.10)] depends on the falloff of the original distribution $P$ in its tails.
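
For reference, the GEV density (2.10) can be evaluated with a few lines of code. The sketch below treats the Gumbel case $\xi = 0$ separately and returns zero outside the support; the parameter values in the usage lines are illustrative assumptions, not fit results, and very small positive $1 + \xi (n-\mu)/\sigma$ for $\xi > 0$ may overflow in this naive form.

```python
import math

def gev_pdf(x, xi, mu, sigma):
    """GEV density of Eq. (2.10); returns 0 outside the support."""
    z = (x - mu) / sigma
    if abs(xi) < 1e-12:
        # Gumbel case, xi = 0
        return math.exp(-math.exp(-z) - z) / sigma
    t = 1.0 + xi * z
    if t <= 0.0:
        # Outside the support: x beyond mu - sigma/xi
        return 0.0
    return t ** (-1.0 - 1.0 / xi) * math.exp(-t ** (-1.0 / xi)) / sigma

# Illustrative parameters only (assumed for this example).
XI, MU, SIGMA = -0.1, 1.2, 1.1
for n in range(6):
    print(n, gev_pdf(n, XI, MU, SIGMA), gev_pdf(n, 0.0, MU, SIGMA))
```

A negative shape parameter (Weibull type) gives a distribution with a finite upper end point at $\mu - \sigma/\xi$, whereas the Gumbel and Fréchet types extend to arbitrarily large scores.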

III. DATA AND RESULTS 

Concerning football matches played in leagues, our main data set consists of matches
played in Germany, namely for the "Bundesliga" (men's premier league FRG, 1963/64 -
2004/05, ≈ 12 800 matches), the "Oberliga" (men's premier league GDR, 1949/50 - 1990/91,
≈ 7700 matches), and for the "Frauen-Bundesliga" (women's premier league FRG, 1997/98
- 2004/05, ≈ 1050 matches) [27]. We examine, in particular, whether
the feedback effects reflected in the football score distributions depend on cultural and
political circumstances and are possibly different between men's and women's leagues. We
first determined histograms estimating the probability density functions (PDFs) $P^{h}(n_h)$
and $P^{a}(n_a)$ of the final scores of the home and away teams, respectively [31]. Similarly,
we determined histograms for the PDFs $P^{\Sigma}(\sigma)$ and $P^{\Delta}(\delta)$ of the
sums and differences of final scores. To arrive at error estimates on the histogram bins, we
utilized the bootstrap resampling scheme.
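
The bootstrap error estimate for histogram bins can be sketched as follows. This is a generic illustration of the resampling idea, not the exact procedure used for the data; the score list, the number of resamples and the random seed are hypothetical choices made for the example.

```python
import random

def histogram(scores, nbins):
    """Relative frequencies of the scores 0, 1, ..., nbins-1."""
    counts = [0] * nbins
    for s in scores:
        if s < nbins:
            counts[s] += 1
    return [c / len(scores) for c in counts]

def bootstrap_errors(scores, nbins, nboot=200, seed=1):
    """Standard deviation of each histogram bin over bootstrap resamples."""
    rng = random.Random(seed)
    # Each resample draws len(scores) matches with replacement.
    samples = [histogram(rng.choices(scores, k=len(scores)), nbins)
               for _ in range(nboot)]
    errors = []
    for b in range(nbins):
        vals = [h[b] for h in samples]
        mean = sum(vals) / nboot
        errors.append((sum((v - mean) ** 2 for v in vals) / (nboot - 1)) ** 0.5)
    return errors

# Hypothetical sample of 1000 final scores of one team (not real league data).
scores = [0] * 300 + [1] * 400 + [2] * 200 + [3] * 80 + [4] * 20
hist = histogram(scores, 6)
errs = bootstrap_errors(scores, 6)
for n, (value, err) in enumerate(zip(hist, errs)):
    print(n, value, err)
```

For a bin with relative frequency $q$ estimated from $M$ matches, the bootstrap error should be close to the binomial estimate $\sqrt{q(1-q)/M}$.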

We first considered fits of the PDFs of the phenomenological descriptions considered
previously, namely the Poissonian form (2.1), the negative binomial distribution (2.4) and
the distributions (2.10) of extreme value statistics. The parameters of fits of these types to
the data are summarized in Table I, comparing the East German "Oberliga" to the West
German "Bundesliga" (1963/64 - 1990/91, ≈ 8400 matches) during the time of the German
division, and in Table II, comparing the data for all games of the German men's premier
league "Bundesliga" to the German women's premier league "Frauen-Bundesliga". Not to
our surprise, and in accordance with previous findings [8], the simple Poissonian ansatz
(2.1) is not found to be an adequate description for any of the data sets. Deviations occur
here mainly in the tails with large numbers of goals, which in general are found to be fatter
than can be accommodated by a Poissonian model, whereas the distribution peaks are
reasonably well represented. In contrast, the negative binomial form (2.4) models all
of the considered data well, as is illustrated with fits of the corresponding form to our data
in Fig. 1, comparing "Oberliga" and "Bundesliga", and in Fig. 2, presenting "Bundesliga"
and "Frauen-Bundesliga". Comparing the leagues, we find that the parameters $r$ of the
NBD fits for the "Bundesliga" are about twice as large as for the "Oberliga", whereas the
parameters $p$ are smaller for the "Bundesliga", cf. the data in Table I. Recalling that the
form (2.4) is in fact the continuum limit of the feedback model "A" discussed above, these
differences translate into larger values of $\kappa$ and smaller values of $p_0$ for the
"Oberliga" results. That is to say, scoring a goal in a match of the East German premier
league was a more encouraging event than scoring a goal in a match of the West German league.
Alternatively, this observation might be interpreted as a stronger tendency of the perhaps
more professionalized teams of the West German league to switch to a strongly defensive
mode of play in case of a lead. Consequently, the tails of the distributions are slightly
fatter for the "Oberliga" than for the "Bundesliga". Comparing the results for the
"Frauen-Bundesliga" to those for the "Bundesliga", even more pronounced tails are found for
the former, resulting in very significantly larger values of the self-affirmation parameter
$\kappa$ for the matches of the women's league, see the fit parameters collected in Table II
and the fits of the NBD type presented in Fig. 2.

Considering the fits of the GEV distributions (2.10) to the data for all three leagues, we
find that extreme value statistics are in general a reasonably good description of the data.
The shape parameter $\xi$ is always found to be small in modulus and negative in the majority
of the cases, indicating a distribution of the Weibull type (which is in agreement with the
findings of Ref. [8]). On the other hand, fixing $\xi = 0$ yields overall clearly larger
values of $\chi^2$ per degree-of-freedom, indicating that the data are hardly compatible with
a distribution of the Gumbel type. Comparing "Oberliga" and "Bundesliga", we consistently find
larger values of the parameter $\xi$ for the former, indicative of the comparatively fatter
tails of these data discussed above, see the data in Table I. The location parameter $\mu$, on
the other hand, is larger for the West German league, which features a larger average number
of goals per match (which can also be read off more directly from the $\lambda$ parameter of
the Poissonian fits), while the scale parameter $\sigma$ is similar for both leagues.
Comparing to the results for the NBD, we do not find any cases where the GEV distributions
would provide the best fit to the data, so clearly the leagues considered here are not of the
type of the general "domestic" league data for which Greenhough et al. [8] found better
matches with the GEV than for the NBD statistics. Similar conclusions hold true for the
comparisons of "Bundesliga" and "Frauen-Bundesliga", with the latter taking on the role of the
"Oberliga".

Assuming, for the time being, that the histograms of the final scores of the home and
away teams are properly modeled by the fits presented in Tables I and II, it is worthwhile
as a consistency check to see whether the resulting estimates (2.2) and (2.5) of the PDFs
for the Poisson and negative binomial distributions are consistent with the data for the
sums and differences. Of course, such consistency can only be expected if the histograms of
home and away scores are statistically independent, an assumption which certainly is a
strongly simplifying approximation. In Table III we summarize the mean squared deviations of
the PDFs (2.2) and (2.5), respectively, evaluated with the parameters of the fits to the home
and away scores of Tables I and II, from the data for the sums and differences. While again
clearly the Poissonian ansatz disqualifies as an acceptable model of the data, the NBD fits
the data for the "Oberliga" and the "Frauen-Bundesliga" comparatively well, cf. the data in
Table III and the "total" fits in Figs. 1 and 2. For the "Bundesliga", however, significant
deviations are observed. These deviations might go back to an effect of correlation between
the home and away scores. To investigate this question we computed the empirical correlation
coefficient,

$$R = \frac{\mathrm{Cov}(n_h, n_a)}{\sigma(n_h)\,\sigma(n_a)}, \tag{3.1}$$

where $\sigma(n)$ denotes the square root of the variance of $n$ and $\mathrm{Cov}(n_h, n_a)$
the covariance of $n_h$ and $n_a$. We find $R = -0.015 \pm 0.011$ for the "Oberliga" and
$R = -0.031 \pm 0.009$ for the "Bundesliga", indicating stronger home-away score correlations
for the "Bundesliga".

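The empirical correlation coefficient between home and away scores, i.e., the covariance divided by the product of the two standard deviations, can be computed as in the following minimal sketch. The short score lists are hypothetical and serve only to show the call; the error bars quoted above are not reproduced here.

```python
def correlation(home, away):
    """Empirical correlation coefficient R = Cov(n_h, n_a) / (sigma(n_h) sigma(n_a))."""
    m = len(home)
    mean_h = sum(home) / m
    mean_a = sum(away) / m
    cov = sum((h - mean_h) * (a - mean_a) for h, a in zip(home, away)) / m
    var_h = sum((h - mean_h) ** 2 for h in home) / m
    var_a = sum((a - mean_a) ** 2 for a in away) / m
    return cov / (var_h * var_a) ** 0.5

# Hypothetical final scores of five matches (not real league data).
home_goals = [2, 1, 0, 3, 1]
away_goals = [0, 1, 1, 0, 2]
print("R =", correlation(home_goals, away_goals))
```

A value of $R$ close to zero is compatible with independent home and away scores, whereas a negative value means that many goals for one team tend to go along with few goals for the other.
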
In total, the best fits so far are clearly achieved by the NBD ansatz. Since this distribution
is obtained only as the continuum limit of the microscopic model "A", it is interesting to
see how fits of the exact distribution (for $N = 90$) resulting from the recurrence (2.8) for
model "A", but also fits of the multiplicatively modified binomial distribution of model "B",
compare to the results found above. We perform fits to the exact distributions of both models
by employing the simplex method to minimize the total $\chi^2$ of the data for the home and
away scores. Alternatively, we also considered fitting additionally to the sums and differences
in a simultaneous fit and found very similar results with only a slight improvement of the
fit quality for the sums and differences at the expense of somewhat worse fits for the home
and away scores. We summarize the fit results in Table IV. We also performed fits to the
more elaborate model "C", but found rather similar results to the simpler model "B" and
hence do not present the results here. Comparing the results of model "A" to the fits of
the limiting NBD, we find almost identical fit qualities for the final scores of both teams.
However, the sums and differences of scores are considerably better described by model "A",
indicating that here the deviations from the continuum limit are still relevant. In Fig. 3 we
present the differences of goals in the German women's premier league together with the fits
of models "A" and "B". The multiplicative model "B", where each goal motivates a team
even more than the previous one, within the statistical errors yields fits of the same quality
as model "A", such that a distinct advantage cannot be attributed to either of them, cf. the
data in Table IV.
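
To illustrate how a fit of the exact model "A" distribution to a histogram works in principle, the sketch below minimizes $\chi^2$ over the two parameters $p_0$ and $\kappa$. Note that a coarse grid search is used here as a crude stand-in for the simplex method, and that the "data" are synthetic, generated from the model itself with assumed parameters and a uniform assumed bin error; none of this reproduces the fits of Table IV.

```python
def model_a(N, p0, kappa):
    """Exact distribution of model "A" from the recurrence (2.8)."""
    dist = [1.0]
    for _ in range(N):
        new = [0.0] * (len(dist) + 1)
        for n, prob in enumerate(dist):
            p = min(1.0, max(0.0, p0 + kappa * n))
            new[n] += (1.0 - p) * prob
            new[n + 1] += p * prob
        dist = new
    return dist

def chi_square(data, errors, model):
    """Sum of squared, error-weighted deviations over the histogram bins."""
    return sum(((d - model[n]) / e) ** 2
               for n, (d, e) in enumerate(zip(data, errors)))

def grid_fit(data, errors, N=90):
    """Coarse grid search over (p0, kappa); a stand-in for the simplex method."""
    best = None
    for i in range(5, 31):          # p0 = 0.005 ... 0.030
        for j in range(0, 11):      # kappa = 0.000 ... 0.010
            p0, kappa = i * 1e-3, j * 1e-3
            chi2 = chi_square(data, errors, model_a(N, p0, kappa))
            if best is None or chi2 < best[0]:
                best = (chi2, p0, kappa)
    return best

# Synthetic "histogram": model A with assumed parameters, eight bins, uniform errors.
data = model_a(90, 0.015, 0.004)[:8]
errors = [0.01] * 8
chi2, p0_fit, kappa_fit = grid_fit(data, errors)
print("chi^2 =", chi2, " p0 =", p0_fit, " kappa =", kappa_fit)
```

In practice, the grid search would be replaced by a proper minimizer and the uniform errors by the bootstrap errors of the histogram bins.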

Finally, to leave the realm of German football, we considered the score data of the "FIFA
World Cup" series from 1930 to 2002, focusing on the results from the qualification stage
(≈ 3400 matches). The results of fits of the phenomenological distributions (2.1), (2.4) and
(2.10) as well as the models "A" and "B" are collected in Table V. Compared
to the domestic league data discussed above, the results of the World Cup show distinctly
heavier tails, cf. the presentation of the data in Fig. 4. Considering the fit results, this
leads to good fits for the heavy-tailed distributions, and, in particular, in this case the GEV
distribution provides a better fit than the negative binomial model, similar to what was found
by Greenhough et al. [8] for some of their data. This difference to the German league data
discussed above can be attributed to the possibly very large differences in skill between the
opposing teams occurring since all countries are allowed to participate in the qualification
round. A glance back to Table II reveals a remarkable similarity with the parameters of
the "Frauen-Bundesliga" (e.g., in both cases the NBD parameters $p$ are comparatively large
while $r$ is small, and the GEV parameters $\xi$ are positive), where a similar explanation
appears quite plausible since the very good players are concentrated in two or three teams
only. Turning to the fits of the models "A" and "B", we again find model "A" to fit rather
similarly to its continuum approximation, the NBD. On the other hand, model "B" describes
the data extremely well, for the away team even better than the GEV distributions (2.10).
It is, of course, also possible and interesting to analyze the results from the final round.
Similar to other cups such as the German "DFB-Pokal" we also considered, the rules are
slightly different here, since no game can end in a draw, leading to special correlation effects
in particular in the histograms of the goal differences. These problems will be investigated
in a forthcoming publication.



IV. SUMMARY 



We have considered German domestic and international football score data with respect 
to certain phenomenological probability distributions as well as microscopically motivated 
models. The Poisson distribution resulting from the assumption of independent scoring
probabilities for the opposing teams does not provide a satisfactory fit to any of our data.
Many data sets are rather well described by the negative binomial distribution considered
before; however, some cases have heavier tails than can be accommodated by this distribution
and, instead, rather follow a distribution from extreme value statistics.

We have shown that football score data can be understood from a certain class of modified 
binomial models with a built-in effect of self-affirmation of the teams upon scoring a goal. 
The negative binomial distribution fitting many of the data sets can in fact be understood 
as a limiting distribution of our model "A" with an additive update rule of the scoring 
probability. It is found, however, that the exact distribution of model "A" provides in 
general rather better fits to the data than the limiting NBD, in particular concerning the
sums and differences of goals scored. However, it does not provide very good fits in cases 
with heavier tails such as the qualification round of the "FIFA World Cup" series. The 
variant model "B", on the other hand, where a multiplicative update rule ensures that each 
goal motivates the team even more than the previous one, fits these world-cup data as well 
as the data from the German domestic leagues extremely well. Thus, the contradicting 
evidence for better fits of some football score data with negative binomial and other data 
with GEV distributions is reconciled with the use of a plausible microscopic model covering 
both cases. We also analyzed results from further leagues, such as the Austrian, Belgian, 
British, Bulgarian, Czechoslovak, Dutch, French, Hungarian, Italian, Portuguese, Romanian, 
Russian, Scottish and Spanish premier leagues, and arrived at similar conclusions. 

Comparing the score data between the separate German premier leagues during the cold 
war times, we find heavier tails for the East German league. In terms of our microscopic 
models, this corresponds to a stronger component of self-affirmation as compared to the West 
German league. Similarly, the German women's premier league "Frauen-Bundesliga" shows 
a much stronger feedback effect than the men's premier league, with at first sight surprisingly 
many parallels to the "FIFA World Cup" series. In general, we find less professionalized 
leagues to feature stronger components of positive feedback upon scoring a goal, perhaps 
indicating a still stronger infection with the football fever there . . . 

It is obvious that the presented models with a single parameter of self-affirmation are a 
gross over-simplification of the complex psycho-social phenomena on a football pitch. It is all
the more surprising, then, how well they model the considered score distributions.
Naturally, however, a plethora of opportunities for improvement of the description and 
further studies opens up. For instance, considering averages over whole leagues or cups, 
we have not taken into account the differences in skill between the teams. Likewise, if 
time-resolved scoring data were made available, a closer investigation of the intra-team and 
inter-team motivation and demotivation effects would provide an intriguing future enterprise 
to undertake. 
APPENDIX A: PROBABILISTICS OF CORRELATED BERNOULLI TRIALS

Consider a series of Bernoulli random variables $u_i$, $i = 1, \ldots, N$, with probabilities
$1 - p_i$ and $p_i$ for the outcomes "0" ("failure") and "1" ("success"), respectively. We are
interested in the distribution $P_N(\sum_{i=1}^{N} u_i = n)$ of the number of successes in $N$
trials. For the limiting case of equal and constant probabilities $p_i = p$,
$i = 1, \ldots, N$, the $u_i$ are i.i.d. random variables and $P_N$ is given by the binomial
distribution

$$P_N\Big(\sum_{i=1}^{N} u_i = n\Big) = \binom{N}{n}\,p^{n}(1-p)^{N-n}, \tag{A1}$$

which is a properly normalized (discrete) probability distribution function according to the
binomial theorem. This can be generalized for arbitrary independent choices of probabilities
$p_i$.

We discuss a more general case where, instead, the probabilities $p_i$ themselves depend on
the number of previous successes, $p_i = p(\sum_{k=1}^{i-1} u_k)$. Due to the introduced
correlations, one should then consider the joint probability distribution of the $u_i$,

$$P(u_1, \ldots, u_N) = \prod_{i=1}^{N}\left\{ p\Big(\sum_{k=1}^{i-1} u_k\Big)\,\delta_{u_i,1} + \Big[1 - p\Big(\sum_{k=1}^{i-1} u_k\Big)\Big]\,\delta_{u_i,0} \right\}, \tag{A2}$$

from which the desired distribution of successes follows as the marginal
$P_N(n) = \sum_{\{u_i\}} P(u_1, \ldots, u_N)\,\delta_{\sum_i u_i,\,n}$. Instead of formally
proceeding from (A2) it is more convenient, however, to observe that the distances $D_j$
between subsequent successes are independent geometrically distributed random variables with
probabilities $1 - p(j)$, i.e. $P(D_j = d_j) = p(j)[1 - p(j)]^{d_j - 1}$, $j = 0, \ldots, n$,
and the desired marginal distribution becomes

$$\begin{aligned}
P_N(n) &= \sum_{d_0=1}^{N-n+1} \cdots \sum_{d_n=1}^{N-n+1} [1-p(0)]^{d_0-1}\,p(0) \cdots p(n-1)\,[1-p(n)]^{d_n-1}\,\delta_{\sum_j d_j,\,N+1} \\
&= \prod_{j=0}^{n-1} p(j) \sum_{d_0=1}^{N-n+1} \cdots \sum_{d_n=1}^{N-n+1}\; \prod_{j=0}^{n} [1-p(j)]^{d_j-1}\,\delta_{\sum_j d_j,\,N+1}.
\end{aligned} \tag{A3}$$

Manipulating this form, it is straightforward to prove a Pascal-type recurrence relation
for the probabilities $P_N(n)$,

$$P_N(n) = [1 - p(n)]\, P_{N-1}(n) + p(n-1)\, P_{N-1}(n-1), \tag{A4}$$

which, together with the initial condition $P_0(0) = 1$ and noting that $P_N(n) = 0$ for $n > N$,
allows one to construct the distribution with an $O(N^2)$ computational effort compared to the
formal $O(2^N)$ effort implied by Eq. (A2). Multiplying (A4) by $u^n$ and summing over all $n$,
one arrives at

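$$G_N(u) - G_{N-1}(u) = (u - 1)\, H_{N-1}(u), \tag{A5}$$

where

$$G_N(u) = \sum_{n=0}^{\infty} P_N(n)\, u^n, \qquad H_N(u) = \sum_{n=0}^{\infty} p(n)\, P_N(n)\, u^n, \tag{A6}$$

such that $G_N(u)$ is the generating function of $P_N(n)$. The continuum limit $N \to t$ is thus
described by the differential equation

$$\frac{\partial G(u,t)}{\partial t} = (u - 1)\, H(u, t). \tag{A7}$$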
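The step from (A4) to (A5) can be made explicit. Multiplying (A4) by $u^n$, summing over $n \ge 0$
and using the convention $P_{N-1}(-1) = 0$ (assumed here, as is implicit in the recurrence) gives

$$\begin{aligned}
G_N(u) &= \sum_{n=0}^{\infty} [1 - p(n)]\, P_{N-1}(n)\, u^n + u \sum_{n=1}^{\infty} p(n-1)\, P_{N-1}(n-1)\, u^{n-1} \\
&= G_{N-1}(u) - H_{N-1}(u) + u\, H_{N-1}(u),
\end{aligned}$$

which is (A5). Replacing the difference $G_N - G_{N-1}$ by the derivative $\partial G / \partial t$
then yields (A7).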

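The additive, correlated binomial model discussed in the main text modifies $p \to p + \kappa$
on each success, unless $p + \kappa > 1$ in which case $p \to 1$. Restricting ourselves to the range
of parameters where $p < 1$, we have $p(n) = p_0 + \kappa n$,
$H_N(u) = p_0\, G_N(u) + \kappa u\, \frac{\mathrm{d}}{\mathrm{d}u} G_N(u)$ and Eq. (A7) becomes

$$\frac{\partial G(u,t)}{\partial t} = (u - 1) \Bigl[ p_0\, G(u,t) + \kappa u\, \frac{\partial G(u,t)}{\partial u} \Bigr], \tag{A8}$$

which is readily checked to be solved by

$$G(u, t) = \bigl[ e^{\kappa t} - u\, (e^{\kappa t} - 1) \bigr]^{-p_0/\kappa}. \tag{A9}$$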



Hence, $P(n)$ has a negative binomial distribution [9, 23],

$$P(n) = \frac{\Gamma(p_0/\kappa + n)}{n!\, \Gamma(p_0/\kappa)}\, p^n (1-p)^{p_0/\kappa} = \frac{\Gamma(r + n)}{n!\, \Gamma(r)}\, p^n (1-p)^r, \tag{A10}$$

where $r = p_0/\kappa$ and $p = 1 - e^{-\kappa t}$. For $N\kappa = \mathrm{const} < 1$, this continuum
approximation is appropriate in the same limit where the Poissonian distribution is a valid
approximation for the binomial distribution (A1), i.e., for $N \gg 1$ with $N p_0 = \mathrm{const}$.
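That the solution (A9) is the generating function of a negative binomial distribution can be seen
by factoring out $e^{\kappa t}$ and applying the binomial series
$(1 - x)^{-r} = \sum_{n} \frac{\Gamma(r+n)}{n!\,\Gamma(r)}\, x^n$, valid for $|x| < 1$. With
$r = p_0/\kappa$ and $p = 1 - e^{-\kappa t}$,

$$G(u,t) = e^{-p_0 t}\, \bigl[ 1 - u\,(1 - e^{-\kappa t}) \bigr]^{-p_0/\kappa} = (1-p)^r\, (1 - p u)^{-r} = \sum_{n=0}^{\infty} \frac{\Gamma(r+n)}{n!\, \Gamma(r)}\, p^n (1-p)^r\, u^n .$$

In the limit $\kappa \to 0$ at fixed $t$, the same expression reduces to
$G(u,t) = e^{p_0 t (u - 1)}$, the generating function of a Poissonian distribution with mean $p_0 t$.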

For the multiplicative, correlated binomial model, after each success the probability is
modified as $p \to \kappa p$ (unless $\kappa p > 1$, in which case $p \to 1$), such that
$p(j) = p_0 \kappa^j$ for the range of parameters where $p(n) < 1$. In this case, the differential
equation (A7) becomes

$$\frac{\partial G(u,t)}{\partial t} = (u - 1)\, p_0\, G(\kappa u, t). \tag{A11}$$

Note that due to the different first arguments of $G$, this is not an ordinary differential
equation. We currently do not see how the solution could be expressed in terms of elementary
or special functions in this case. Still, the distribution $P_N(n)$ can be easily computed from
the recurrence (A4).
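A minimal Python sketch of the recurrence (A4) may look as follows. The function name
`success_distribution`, the choice of $N = 90$ trials and the parameter values are assumptions made
for illustration only; the parameters are merely of the same order of magnitude as those quoted in
the tables.

```python
def success_distribution(p, N):
    """Return [P_N(0), ..., P_N(N)] from the recurrence (A4),

        P_N(n) = [1 - p(n)] P_{N-1}(n) + p(n-1) P_{N-1}(n-1),

    starting from P_0(0) = 1. p(n) is the success probability after n
    previous successes. The cost is O(N^2).
    """
    prev = [1.0]                                  # P_0(0) = 1
    for trials in range(1, N + 1):
        cur = [0.0] * (trials + 1)
        for n in range(trials + 1):
            # P_{N-1}(n) vanishes for n > N - 1
            stay = (1.0 - p(n)) * prev[n] if n < trials else 0.0
            step = p(n - 1) * prev[n - 1] if n > 0 else 0.0
            cur[n] = stay + step
        prev = cur
    return prev


N_TRIALS = 90        # assumed number of trials (illustration only)
P0 = 0.02            # assumed base probability
KAPPA_ADD = 0.0015   # assumed additive increment
KAPPA_MUL = 1.07     # assumed multiplicative factor

# Additive model p(n) = p0 + kappa n and multiplicative model
# p(n) = p0 kappa^n, both capped at 1.
additive = success_distribution(
    lambda n: min(1.0, P0 + KAPPA_ADD * n), N_TRIALS)
multiplicative = success_distribution(
    lambda n: min(1.0, P0 * KAPPA_MUL ** n), N_TRIALS)
```

Passing the rule $p(n)$ as a function keeps the recurrence itself independent of the model, so the
same routine serves the additive and the multiplicative case.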

Finally, for the case of two coupled, correlated binomial distributions with probabilities
$p_A$ for "success A", $p_B$ for "success B" and $(1 - p_A - p_B)$ for "failure", similar
considerations lead to a recurrence relation

$$\begin{aligned}
P_N(n_A, n_B) = {} & [1 - p_A(n_A, n_B) - p_B(n_A, n_B)]\, P_{N-1}(n_A, n_B) \\
& + p_A(n_A - 1, n_B)\, P_{N-1}(n_A - 1, n_B) \\
& + p_B(n_A, n_B - 1)\, P_{N-1}(n_A, n_B - 1),
\end{aligned} \tag{A12}$$

from which the distributions $P_N(n_A, n_B)$ for the model variants "A", "B" and "C" can be
easily computed in $O(N^3)$ time.
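The two-dimensional recurrence (A12) can be sketched in Python as below. The code propagates the
probability mass forward from each state $(n_A, n_B)$, which is equivalent to evaluating (A12) for
every target state. The function names, the value $N = 90$ and the coupling rule used in the
example (each probability growing additively with the team's own count) are hypothetical and are
not meant to reproduce the definitions of the model variants in the main text.

```python
from collections import defaultdict


def joint_distribution(p_a, p_b, N):
    """Return {(n_A, n_B): P_N(n_A, n_B)} following the recurrence (A12).

    p_a(n_A, n_B) and p_b(n_A, n_B) are the probabilities of "success A"
    and "success B" given the current counts; p_a + p_b <= 1 is required.
    """
    dist = {(0, 0): 1.0}
    for _ in range(N):
        new = defaultdict(float)
        for (n_a, n_b), prob in dist.items():
            pa, pb = p_a(n_a, n_b), p_b(n_a, n_b)
            new[(n_a, n_b)] += (1.0 - pa - pb) * prob   # failure
            new[(n_a + 1, n_b)] += pa * prob            # success A
            new[(n_a, n_b + 1)] += pb * prob            # success B
        dist = dict(new)
    return dist


def difference_distribution(dist):
    """Distribution of n_A - n_B obtained from the joint distribution."""
    diff = defaultdict(float)
    for (n_a, n_b), prob in dist.items():
        diff[n_a - n_b] += prob
    return dict(diff)


# Hypothetical example: additive self-affirmation for each side separately.
joint = joint_distribution(
    lambda n_a, n_b: 0.0200 + 0.0015 * n_a,
    lambda n_a, n_b: 0.0125 + 0.0012 * n_b,
    90,
)
difference = difference_distribution(joint)
```

After $N$ steps there are of order $N^2$ states with non-zero weight, so the total effort scales
as $N^3$.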
TABLE I: Fits of the phenomenological distributions (Poisson, negative binomial (NBD), GEV and
Gumbel) to the data for the East German "Oberliga" between 1949/50 and 1990/91 and for the West
German "Bundesliga" for the seasons of 1963/64 - 1990/91.

                    Oberliga                      Bundesliga
                    Home           Away           Home           Away
Poisson  λ          1.85 ± 0.02    1.05 ± 0.01    2.01 ± 0.02    1.17 ± 0.01
         χ²/d.o.f.  12.5           12.8           6.53           7.31
NBD      p          0.17 ± 0.01    0.14 ± 0.01    0.11 ± 0.01    0.10 ± 0.01
         r          9.06 ± 0.88    6.90 ± 0.84    15.9 ± 2.10    11.3 ± 1.84
         p_0        0.0191         0.0112         0.0213         0.0126
         κ          0.0021         0.0016         0.0013         0.0011
         χ²/d.o.f.  0.99           4.09           0.68           2.29
GEV      ξ          -0.05 ± 0.01   0.02 ± 0.01    -0.09 ± 0.01   -0.01 ± 0.01
         μ          1.12 ± 0.02    0.49 ± 0.02    1.28 ± 0.02    0.58 ± 0.02
         σ          1.30 ± 0.02    0.90 ± 0.02    1.36 ± 0.02    0.96 ± 0.02
         χ²/d.o.f.  1.93           5.04           1.83           4.74
Gumbel   μ          1.12 ± 0.02    0.48 ± 0.02    1.28 ± 0.02    0.59 ± 0.01
         σ          1.25 ± 0.01    0.92 ± 0.01    1.25 ± 0.01    0.95 ± 0.01
         χ²/d.o.f.  4.13           4.65           12.9           4.06

TABLE II: Fits of the phenomenological distributions (Poisson, negative binomial (NBD), GEV and
Gumbel) to the data for the German men's premier league "Bundesliga" between 1963/64 and 2004/05
and for the German women's premier league "Frauen-Bundesliga" for the seasons of 1997/98 - 2004/05.

                    Bundesliga                    Frauen-Bundesliga
                    Home           Away           Home           Away
Poisson  λ          1.91 ± 0.01    1.16 ± 0.01    1.78 ± 0.04    1.36 ± 0.04
         χ²/d.o.f.  9.21           9.13           14.6           14.4
NBD      p          0.11 ± 0.01    0.09 ± 0.01    0.45 ± 0.03    0.46 ± 0.03
         r          16.24 ± 1.82   12.08 ± 1.69   2.38 ± 0.24    1.97 ± 0.22
         p_0        0.0202         0.0125         0.0160         0.0133
         κ          0.0012         0.0010         0.0067         0.0068
         χ²/d.o.f.  1.08           2.22           2.32           1.37
GEV      ξ          -0.10 ± 0.01   -0.02 ± 0.01   0.04 ± 0.04    0.25 ± 0.07
         μ          1.17 ± 0.02    0.57 ± 0.01    0.83 ± 0.08    0.77 ± 0.07
         σ          1.33 ± 0.01    0.96 ± 0.01    1.49 ± 0.06    1.18 ± 0.05
         χ²/d.o.f.  3.43           7.95           3.40           1.55
Gumbel   μ          1.18 ± 0.01    0.58 ± 0.01    0.81 ± 0.08    0.58 ± 0.07
         σ          1.21 ± 0.01    0.94 ± 0.01    1.53 ± 0.05    1.31 ± 0.05
         χ²/d.o.f.  24.5           7.26           3.17           4.09

TABLE III: Matching of the sums and differences of goals. Fits were performed to the home and
away score distributions only and mean-squared deviations were computed for the distributions of
sums and differences from Eqs. (2.2) and (2.5) with the thus found parameters λ_h and λ_a, resp.
p_h, r_h, p_a and r_a.

                                 Bundesliga 04/05  Bundesliga 90/91  Oberliga          Women
Poisson    λ_h                   1.91 ± 0.01       2.01 ± 0.02       1.85 ± 0.02       1.78 ± 0.04
           λ_a                   1.16 ± 0.01       1.17 ± 0.01       1.05 ± 0.01       1.36 ± 0.04
           Home χ²/d.o.f.        9.21              6.53              12.5              14.6
           Away χ²/d.o.f.        9.13              7.31              12.8              14.4
           Total χ²/d.o.f.       10.7              15.9              16.3              10.4
           Difference χ²/d.o.f.  67.6              578               474               20.2
NBD        p_h                   0.11 ± 0.01       0.11 ± 0.01       0.17 ± 0.01       0.45 ± 0.03
           r_h                   16.24 ± 1.82      15.9 ± 2.10       9.06 ± 0.82       2.38 ± 0.24
           p_a                   0.09 ± 0.01       0.10 ± 0.01       0.14 ± 0.01       0.46 ± 0.03
           r_a                   12.08 ± 1.69      11.3 ± 1.84       6.90 ± 0.84       1.97 ± 0.22
           Home χ²/d.o.f.        1.08              0.68              0.99              2.32
           Away χ²/d.o.f.        2.22              2.29              4.09              1.37
           Total χ²/d.o.f.       25.1              17.3              8.31              18.9
           Difference χ²/d.o.f.  23.9              18.0              7.16              3.55

TABLE IV: Fit results for models "A" and "B". Fits were performed to the score distributions of
the home and away teams only and the resulting model estimates for the sums and differences of
goals compared to the data.

                                 Bundesliga 04/05  Bundesliga 90/91  Oberliga          Women
Model "A"  p_0,h                 0.0199 ± 0.0002   0.0210 ± 0.0002   0.0188 ± 0.0002   0.0159 ± 0.0005
           κ_h                   0.0015 ± 0.0001   0.0016 ± 0.0002   0.0024 ± 0.0002   0.0070 ± 0.0005
           p_0,a                 0.0125 ± 0.0002   0.0125 ± 0.0001   0.0112 ± 0.0001   0.0132 ± 0.0004
           κ_a                   0.0012 ± 0.0001   0.0013 ± 0.0002   0.0018 ± 0.0002   0.0071 ± 0.0007
           Home χ²/d.o.f.        1.01              0.68              1.07              2.28
           Away χ²/d.o.f.        2.31              2.37              4.23              1.44
           Total χ²/d.o.f.       16.6              11.5              5.33              12.4
           Difference χ²/d.o.f.  18.6              14.0              5.63              2.86
Model "B"  p_0,h                 0.0200 ± 0.0002   0.0211 ± 0.0002   0.0189 ± 0.0002   0.0166 ± 0.0005
           κ_h                   1.0679 ± 0.0060   1.0695 ± 0.0072   1.1115 ± 0.0083   1.3146 ± 0.0303
           p_0,a                 0.0125 ± 0.0001   0.0125 ± 0.0002   0.0112 ± 0.0001   0.0138 ± 0.0004
           κ_a                   1.0932 ± 0.0106   1.1015 ± 0.0124   1.1526 ± 0.0149   1.4115 ± 0.0543
           Home χ²/d.o.f.        1.25              0.71              0.75              3.24
           Away χ²/d.o.f.        1.96              2.02              3.35              0.95
           Total χ²/d.o.f.       16.9              11.8              5.40              13.5
           Difference χ²/d.o.f.  18.4              13.8              5.26              2.82

TABLE V: Fit results for the qualification phase of the "FIFA World Cup" series from 1930 to
2002.

                      Home              Away
Poisson    λ          1.53 ± 0.02       0.89 ± 0.01
           χ²/d.o.f.  18.6              25.0
NBD        p          0.37 ± 0.02       0.38 ± 0.02
           r          3.04 ± 0.21       1.76 ± 0.12
           p_0        0.0154            0.0094
           κ          0.0051            0.0053
           χ²/d.o.f.  2.67              2.02
GEV        ξ          0.11 ± 0.02       0.19 ± 0.02
           μ          0.86 ± 0.03       0.36 ± 0.03
           σ          1.21 ± 0.03       0.86 ± 0.02
           χ²/d.o.f.  0.85              1.89
Gumbel     μ          0.80 ± 0.03       0.25 ± 0.03
           σ          1.31 ± 0.02       0.94 ± 0.02
           χ²/d.o.f.  3.29              12.9
Model "A"  p_0        0.0152 ± 0.0003   0.0093 ± 0.0002
           κ          0.0053 ± 0.0003   0.0055 ± 0.0003
           χ²/d.o.f.  2.88              2.19
Model "B"  p_0        0.0155 ± 0.0002   0.0095 ± 0.0002
           κ          1.2780 ± 0.0130   1.4775 ± 0.0343
           χ²/d.o.f.  0.92              0.80


[Figure 1: two panels; horizontal axes: goals]

FIG. 1: Probability density of goals scored by home teams, away teams, and of the total number
of goals scored in the match. Left: "Oberliga" of the GDR between 1949 and 1990. Right:
"Bundesliga" of the FRG in the seasons of 1963/64 - 1990/91. The lines for "home" and "away"
show fits of the negative binomial distribution (2.4) to the data; the line for "total" denotes the
resulting distribution of the sum, Eq. (2.5).




FIG. 2: Probability density of goals scored in the German premier league "Bundesliga" for all 
seasons (left) and in the women's "Frauen-Bundesliga" (right). 
FIG. 3: Goal differences in the German women's premier league together with fits of models "A"
and "B".
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FIG. 4: Probability density of goals scored by the home and away teams in the qualification stage 
of the "FIFA World Cup" series from 1930 to 2002. 
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